File 347:JAPIO Nov 1976-2005/Jan (Updated 050506)

(c) 2005 JPO & JAPIO 
File 350:Derwent WPIX 1963 - 2005/UD, UM &UP=200530 

(c) 2005 Thomson Derwent 

Set     Items   Description
S1     259755   (PREDETERMIN? OR PRESET? OR PREESTABLISH? OR PREDEFIN? OR
                PREARRANGED OR PRESCRIBED OR (PREVIOUSLY OR PRE) () (DETERMIN?
                OR SET???? OR ESTABLISH? OR DEFIN? OR ARRANGED)) (5N) (VALUE? ?
                OR SCORE? ? OR NUMBER? ? OR NUMERAL? ?)
S2      28318   (DISTANCE? ? OR SIMILARITY) (7N) (VALUE? ? OR SCORE? ? OR
                NUMBER? ? OR NUMERAL? ? OR FUNCTION? ?)
S3       3796   S1 AND S2
S4       1429   S3(5N) (SMALLER OR MINIMAL OR MINIMUM OR LEAST OR LOWEST OR
                LOWER OR BELOW OR ABOVE OR (LESS OR MORE) () (THEN OR THAN) OR
                GREATER OR HIGHER OR LARGER OR BIGGER OR MAXIMUM OR
                THRESHOLD? ?)
S5      26049   (SUMMARY OR SUMMARIES OR SUMMARIZ? OR SUMMARIS? OR ABSTRACT?
                OR SYNTHES? OR SYNOPSI?) (5N) (STORY OR STORIES OR ARTICLE? ?
                OR DOCUMENT? ? OR PRESS () RELEASE? ? OR CONTENT OR INFORMATION
                OR DATA OR NEWS OR TEXT? ? OR CLIP? ? OR PAGE? ? OR WEBPAGE? ?
                OR BROADC
S6          5   S4 AND S5
S7     125475   (DIVID? OR SEPARAT? OR PARTITION??? OR GROUP??? OR CLUSTER???
                OR CATEGORIZ? OR CATEGORIS?) (5N) (STORY OR STORIES OR
                ARTICLE? ? OR DOCUMENT? ? OR PRESS () RELEASE? ? OR CONTENT OR
                INFORMATION OR DATA OR NEWS OR TEXT? ? OR CLIP? ? OR PAGE? ? OR
                WEBPAGE? ? OR B
S8     384780   (BUFFER??? OR MEMORY OR RAM OR STACK OR QUEU????) (5N) (STORY
                OR STORIES OR ARTICLE? ? OR DOCUMENT? ? OR PRESS () RELEASE? ?
                OR CONTENT OR INFORMATION OR DATA OR NEWS OR TEXT? ? OR CLIP? ?
                OR PAGE? ? OR WEBPAGE? ? OR BROADCAST? ? OR TELECAST? ?)
S9       4529   S2(5N) (SMALLER OR MINIMAL OR MINIMUM OR LEAST OR LOWEST OR
                LOWER OR BELOW OR ABOVE OR (LESS OR MORE) () (THEN OR THAN) OR
                GREATER OR HIGHER OR LARGER OR BIGGER OR MAXIMUM OR
                THRESHOLD? ?)
S10      1123   S3 AND S9
S11         5   S10 AND S5
S12         0   S11 NOT S6
S13        74   S10 AND S7:S8
S14        24   S13 AND IC=G06F
S15      5591   (PREDETERMIN? OR PRESET? OR PREESTABLISH? OR PREDEFIN? OR
                PREARRANGED OR PRESCRIBED OR (PREVIOUSLY OR PRE) () (DETERMIN?
                OR SET???? OR ESTABLISH? OR DEFIN? OR ARRANGED)) (7N)
                (BUFFER? ? OR QUEUE? ?)
S16        16   S15 AND S2
S17      6398   (SIMILAR OR ANALAGOUS OR COMPARABLE OR EQUIVALENT) (3W) (STORY
                OR STORIES OR ARTICLE? ? OR DOCUMENT? ? OR PRESS () RELEASE? ?
                OR CONTENT OR INFORMATION OR DATA OR NEWS OR TEXT? ? OR CLIP? ?
                OR PAGE? ? OR WEBPAGE? ? OR BROADCAST? ? OR TELECAST? ?)
S18       271   (DIVID? OR SEPARAT? OR PARTITION??? OR GROUP??? OR CLUSTER???
                OR CATEGORIZ? OR CATEGORIS?) (7N)S17
S19         6   S18 AND S5
S20         5   S18 AND S2
S21         4   S20 NOT S19
S22        12   S18 AND (S1 OR S15)



19/5/4 (Item 2 from file: 350) 

DIALOG (R) File 350:Derwent WPIX 
(c) 2005 Thomson Derwent. All rts. reserv. 

012413614 **Image available** 

WPI Acc No: 1999-219722/199919 

XRPX Acc No: N99-162560

Document, e.g. book, paper, report, processing apparatus for producing a 
new, more readable summary by referring to and utilizing stored past 
text and group of summary - has summary unit which refers to 
summary sentence obtained by summary sentence acquisition unit and 
produces summary sentence of document obtained by document 
acquisition unit 

Patent Assignee: JUST SYSTEM KK (JUST-N) 

Number of Countries: 001 Number of Patents: 001 

Patent Family:
Patent No     Kind  Date      Applicat No   Kind  Date      Week
JP 11053396   A     19990226  JP 97219301   A     19970729  199919 B

Priority Applications (No Type Date): JP 97219301 A 19970729
Patent Details:
Patent No     Kind  Lan  Pg  Main IPC      Filing Notes
JP 11053396   A           9  G06F-017/30

Abstract (Basic): JP 11053396 A

NOVELTY - A summary unit refers to a summary sentence obtained by a
summary sentence acquisition unit and produces a summary sentence of
the document obtained by a document acquisition unit.

DETAILED DESCRIPTION - The document obtained by the document acquisition
unit is in a predetermined format. A similar-document searching unit
searches for the group of the summary of the obtained document, a
similar past document, and a document from the database. The summary
sentence acquisition unit obtains the summary sentence from the
searched document. INDEPENDENT CLAIMS are also included for the
following: a memory medium which stores a document processing program;
and a document processing method.

USE - For producing a new, more readable summary by referring to
and utilizing stored past texts and group of summary.

ADVANTAGE - Has high accuracy and produces a summary which allows
easy understanding of the contents of a document.

DESCRIPTION OF DRAWING(S) - The figure is a block diagram showing the
structure of the document processing apparatus.

Dwg.1/6

Title Terms: DOCUMENT; BOOK; PAPER; REPORT; PROCESS; APPARATUS; PRODUCE;
NEW; MORE; READ; SUMMARY; REFER; STORAGE; PASS; TEXT; GROUP; SUMMARY;
SUMMARY; UNIT; REFER; SUMMARY; SENTENCE; OBTAIN; SUMMARY; SENTENCE;
ACQUIRE; UNIT; PRODUCE; SUMMARY; SENTENCE; DOCUMENT; OBTAIN; DOCUMENT;
ACQUIRE; UNIT
Derwent Class: T01 

International Patent Class (Main) : G06F-017/30 
File Segment: EPI 



19/5/5 (Item 3 from file: 350) 

DIALOG (R) File 350:Derwent WPIX 

(c) 2005 Thomson Derwent. All rts. reserv. 

012394513 **Image available** 

WPI Acc No: 1999-200620/199917

XRPX Acc No: N99-148448 

Document processing apparatus for automatic production of summary to 
various books, papers and reports - produces summary of documents 
automatically for every similar document group , grouped by 
similar document group production unit 

Patent Assignee: JUST SYSTEM KK (JUST-N)

Number of Countries: 001 Number of Patents: 001
Patent Family:
Patent No     Kind  Date      Applicat No   Kind  Date      Week
JP 11045288   A     19990216  JP 97218229   A     19970729  199917 B

Priority Applications (No Type Date): JP 97218229 A 19970729
Patent Details:
Patent No     Kind  Lan  Pg  Main IPC      Filing Notes
JP 11045288   A          11  G06F-017/30

Abstract (Basic): JP 11045288 A

NOVELTY - The summary of a document is produced automatically by
a summary production unit for every similar document group
grouped by a similar document group production unit. Similarity
between the documents is computed by a similarity calculation unit with
several documents of predetermined format acquired by a document
acquisition unit.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are included for the
following: document processing method; a processing program memory
medium.

USE - For automatic production of summary to various books, papers
and reports.

ADVANTAGE - Unifies summary of every similar group of
documents offering convenience to read.

DESCRIPTION OF DRAWING(S) - The figure shows block diagram of document
processing apparatus.

Dwg.1/11

Title Terms: DOCUMENT; PROCESS; APPARATUS; AUTOMATIC; PRODUCE; SUMMARY;
VARIOUS; BOOK; PAPER; REPORT; PRODUCE; SUMMARY; DOCUMENT; AUTOMATIC;
SIMILAR; DOCUMENT; GROUP; GROUP; SIMILAR; DOCUMENT; GROUP; PRODUCE; UNIT
Derwent Class: T01 

International Patent Class (Main) : G06F-017/30 
International Patent Class (Additional) : G06F-017/27 
File Segment: EPI

(c) 2005 European Patent Office
(c) 2005 WIPO/Univentio

S2      23254   S1(5N) (STORY OR STORIES OR ARTICLE? ? OR DOCUMENT? ? OR
                PRESS () RELEASE? ? OR CONTENT OR INFORMATION OR DATA OR NEWS
                OR TEXT? ? OR CLIP? ? OR PAGE? ? OR WEBPAGE? ? OR BROADCAST? ?
                OR TELECAST? ?)
S3     159828   (DIVID? OR SEPARAT? OR PARTITION??? OR GROUP??? OR CLUSTER???
                OR CATEGORIZ? OR CATEGORIS?) (5N) (STORY OR STORIES OR
                ARTICLE? ? OR DOCUMENT? ? OR PRESS () RELEASE? ? OR CONTENT OR
                INFORMATION OR DATA OR NEWS OR TEXT? ? OR CLIP? ? OR PAGE? ? OR
                WEBPAGE? ? OR B
S4     155786   (BUFFER??? OR MEMORY OR RAM OR STACK OR QUEU????) (5N) (STORY
                OR STORIES OR ARTICLE? ? OR DOCUMENT? ? OR PRESS () RELEASE? ?
                OR CONTENT OR INFORMATION OR DATA OR NEWS OR TEXT? ? OR CLIP? ?
                OR PAGE? ? OR WEBPAGE? ? OR BROADCAST? ? OR TELECAST? ?)
S5      38150   S1(5N) (VALUE? ? OR SCORE? ? OR NUMBER? ? OR NUMERAL? ? OR
                FUNCTION? ?)
S6       6449   S5(5N) (SMALLER OR MINIMAL OR MINIMUM OR LEAST OR LOWEST OR
                LOWER OR (LESS OR MORE) () (THEN OR THAN) OR GREATER OR HIGHER
                OR LARGER OR BIGGER OR MAXIMUM OR THRESHOLD? ?)
S7      74408   (SUMMARY OR SUMMARIES OR SUMMARIZ? OR SUMMARIS? OR ABSTRACT?
                OR SYNTHES? OR SYNOPSI?) (5N) (STORY OR STORIES OR ARTICLE? ?
                OR DOCUMENT? ? OR PRESS () RELEASE? ? OR CONTENT OR INFORMATION
                OR DATA OR NEWS OR TEXT? ? OR CLIP? ? OR PAGE? ? OR WEBPAGE? ?
                OR BROADC
S8        270   S2(50N)S3:S4(50N)S6(50N)S7
S9        111   S8 AND IC=G06F
S10        58   S3:S4(10N)S6
S11        38   S2(50N)S10
S12        37   S11 AND AY=(1970:2002)/PR
S13      2415   (DIVID? OR SEPARAT? OR PARTITION??? OR GROUP??? OR CLUSTER???
                OR CATEGORIZ? OR CATEGORIS?) (7N)S5
S14       358   S13(20N)S6
S15        56   S14(100N)S7
S16        52   S15 NOT S10
S17    153516   (PREDETERMIN? OR PRESET? OR PREESTABLISH? OR PREDEFIN? OR
                PREARRANGED OR PRESCRIBED OR (PREVIOUSLY OR PRE) () (DETERMIN?
                OR SET???? OR ESTABLISH? OR DEFIN? OR ARRANGED)) (5N) (VALUE? ?
                OR SCORE? ? OR NUMBER? ? OR NUMERAL? ?)
         9586   (DIVID? OR SEPARAT? OR PARTITION??? OR GROUP??? OR CLUSTER???
                OR CATEGORIZ? OR CATEGORIS?) (10N)S17
           45   S6(50N)S18
           37   S19 NOT (S10 OR S16)
          960   S6(20N)S17
           76   S22 NOT (S10 OR S16 OR S20)

12/3,K/10 (Item 10 from file: 348)

DIALOG(R) File 348:EUROPEAN PATENTS 
(c) 2005 European Patent Office. All rts. reserv. 



01118380 

METHOD AND SYSTEM FOR RETRIEVING RELEVANT DOCUMENTS FROM A DATABASE
METHODE UND VERFAHREN UM RELEVANTE DOKUMENTE IN EINER DATENBANK ZU FINDEN
PROCEDE ET SYSTEME POUR L'EXTRACTION DE DOCUMENTS PERTINENTS D'UNE BASE DE
DONNEES

PATENT ASSIGNEE: 

KCSL, Inc., (2910941), Suite 1012, 5160 Yonge Street, Toronto, Ontario 
M2N 6L9, (CA) , (Proprietor designated states: all) 
INVENTOR : 

KAUFMAN, Ilia, 18 Brandy Court, Toronto, Ontario M3B 3L3, (CA) 
LEGAL REPRESENTATIVE: 

Boyce, Conor et al (74271), F. R. Kelly & Co., 27 Clyde Road, Ballsbridge 
, Dublin 4, (IE) 
PATENT (CC, No, Kind, Date): EP 1086432 A1 010328 (Basic)
EP 1086432 B1 040407
WO 1999064964 991216
APPLICATION (CC, No, Date): EP 99924619 990607; WO 99CA531 990607
PRIORITY (CC, No, Date): US 88483 P 980608
DESIGNATED STATES: AT; BE; CH; CY; DE; DK; ES; FI; FR; GB; GR; IE; IT; LI;
LU; MC; NL; PT; SE
INTERNATIONAL PATENT CLASS: G06F-017/30 
NOTE: 

No A- document published by EPO 
LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:

Available Text  Language   Update    Word Count
CLAIMS B        (English)  200415       779
CLAIMS B        (German)   200415       731
CLAIMS B        (French)   200415       857
SPEC B          (English)  200415      6447
Total word count - document A             0
Total word count - document B          8814
Total word count - documents A + B     8814



...SPECIFICATION method of the invention takes into account the
distribution of query-words in a document, a candidate document will
receive a higher similarity score when the document includes a
large concentration, or clustering, of query-words. This renders the
method of the invention relatively immune to isolated and sporadic
occurrences...



12/3,K/12 (Item 12 from file: 348) 

DIALOG (R) File 348:EUROPEAN PATENTS 
(c) 2005 European Patent Office. All rts. reserv. 

00781211 

MULTI-LAYER INFORMATION STORAGE SYSTEM
MEHRSCHICHTINFORMATIONSSPEICHERSYSTEM
SYSTEME MULTICOUCHE DE STOCKAGE D'INFORMATION

PATENT ASSIGNEE: 

Koninklijke Philips Electronics N.V., (1489041), Groenewoudseweg 1, 5621
BA Eindhoven, (NL), (Proprietor designated states: AT; BE; DE; FR; GB;
IT; SE)
PHILIPS NORDEN AB, (221813), Kottbygatan 5, Kista, 164 85 Stockholm,
(SE), (Proprietor designated states: SE)
INVENTOR:
COOPS, Peter, Groenewoudseweg 1, NL-5621 BA Eindhoven, (NL)
HEEMSKERK, Jacobus, Petrus, Josephus, Groenewoudseweg 1, NL-5621 BA
Eindhoven, (NL)
VISSER, Derk, Groenewoudseweg 1, NL-5621 BA Eindhoven, (NL)
HOLTSLAG, Antonius, Hendricus, Maria, Groenewoudseweg 1, NL-5621 BA
Eindhoven, (NL)
LEGAL REPRESENTATIVE: 

Visser, Derk et al (75441) , Philips Intellectual Property & Standards 
P.O. Box 220, 5600 AE Eindhoven, (NL) 
PATENT (CC, No, Kind, Date): EP 729629 A1 960904 (Basic)
EP 729629 B1 031105
WO 96006427 960229
APPLICATION (CC, No, Date): EP 95927046 950816; WO 95IB648 950816 
PRIORITY (CC, No, Date) : EP 94202416 940823; US 299861 940901 
DESIGNATED STATES: AT; BE; DE; FR; GB; IT; SE 
INTERNATIONAL PATENT CLASS: G11B-007/00 
NOTE: 

No A- document published by EPO 
LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:

Available Text  Language   Update    Word Count
CLAIMS B        (English)  200345       591
CLAIMS B        (German)   200345       488
CLAIMS B        (French)   200345       663
SPEC B          (English)  200345      6544
Total word count - document A             0
Total word count - document B          8286
Total word count - documents A + B     8286



...SPECIFICATION layers with a single spherical aberration compensation may
be advantageously combined with the feature of the minimum distance of
the information layers. A decrease of the minimum distance increases
the number of information layers that fit in a stack of a certain
thickness. Hence, such a decrease increases the information density of
the record carrier and...



12/3,K/13 (Item 13 from file: 348)

DIALOG(R) File 348:EUROPEAN PATENTS
(c) 2005 European Patent Office. All rts. reserv.

00772993

Method of identifying similarities in code segments
Verfahren zur Identifizierung von Gleichartigkeiten zwischen Codesegmenten
Methode d'identification de similitudes entre des segments de code

PATENT ASSIGNEE: 

AT&T Corp., (589370), 32 Avenue of the Americas, New York, NY 10013-2412, 
(US), (applicant designated states: DE;FR;GB) 
INVENTOR : 

Goodnow II, James E., 12477 Old Mine Road, Grass Valley, California 95945 
, (US) 

Helfman, Jonathan I., 151 Riverview Avenue, Gillette, New Jersey 07933, 
(US) 

Kowalski, Thaddeus J., 73 Stoneridge Road, Summit, New Jersey 07901, (US) 
Puttress, John J., 75 Elkwood Avenue, New Providence, New Jersey 07974, 
(US) 

Rowland, James R., 18 Thackeray Drive, Short Hills, New Jersey 07078,
(US)
Seaquist, Carl R., 1154 Terrace Acres Drive, Auburn, Alabama 36830, (US)
LEGAL REPRESENTATIVE: 

Buckley, Christopher Simon Thirsk et al (28912) , Lucent Technologies, 5 
Mornington Road, Woodford Green, Essex IG8 0TU, (GB) 
PATENT (CC, No, Kind, Date): EP 723224 A1 960724 (Basic)
APPLICATION (CC, No, Date): EP 96300183 960110;
PRIORITY (CC, No, Date): US 373342 950117
DESIGNATED STATES: DE; FR; GB
INTERNATIONAL PATENT CLASS: G06F-011/00;
ABSTRACT WORD COUNT: 151

LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:

Available Text  Language   Update    Word Count
CLAIMS A        (English)  EPAB96       672
SPEC A          (English)  EPAB96      4076
Total word count - document A          4748
Total word count - document B             0
Total word count - documents A + B     4748

...SPECIFICATION applied. The intensity of block 330 represents the data 
values. Illustratively, the darker the block 330, the larger the data 
value . 

A separate distance function 0( sub(D)) is used to compute 
similarity measurements between pairs of operators from different 
functions as . . . 

12/3,K/14 (Item 14 from file: 348)

DIALOG(R) File 348:EUROPEAN PATENTS
(c) 2005 European Patent Office. All rts. reserv.

00684901

Analyzing an image to obtain a stable number of groups
Bildanalyse zur Ermittlung einer stabilen Anzahl von Gruppen
Analyse d'image afin d'obtenir un nombre de groupes stable

PATENT ASSIGNEE: 

XEROX CORPORATION, (219783), Xerox Square, Rochester, New York 14644, 
(US), (Proprietor designated states: all) 
INVENTOR : 

Mahoney, James V., 1245 Kearny Street, Apt.2B, San Francisco, CA 94133, 
(US) 

Rao, Satyajit, 550 Memorial Drive, Apt. 17A2, Cambridge, MA 02139, (US) 
LEGAL REPRESENTATIVE: 

Grunecker, Kinkeldey, Stockmair & Schwanhausser Anwaltssozietat (100721) 
, Maximilianstrasse 58, 80538 Munchen, (DE) 
PATENT (CC, No, Kind, Date): EP 654752 A2 950524 (Basic)
EP 654752 A3 951206
EP 654752 B1 020313
APPLICATION (CC, No, Date): EP 94308655 941123;
PRIORITY (CC, No, Date): US 158053 931124
DESIGNATED STATES: DE; FR; GB
INTERNATIONAL PATENT CLASS: G06K-009/20
ABSTRACT WORD COUNT: 246
NOTE:
Figure number on first page: 1

LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:

Available Text  Language   Update    Word Count
CLAIMS A        (English)  EPAB95      1943
CLAIMS B        (English)  200211      2650
CLAIMS B        (German)   200211      2712
CLAIMS B        (French)   200211      3056
SPEC A          (English)  EPAB95     12837
SPEC B          (English)  200211     13069
Total word count - document A         14782
Total word count - document B         21487
Total word count - documents A + B    36269

...SPECIFICATION Fig. 4 finds a threshold that produces a stable number of
groups by iteratively applying thresholds to distances data, with
each iteration incrementing the threshold. The approach of Fig. 5 finds a
threshold that produces a stable number of groups by iteratively applying
thresholds to the distances data, with each iteration increasing the
threshold by a difference between distances. The approach of Fig. 6 finds
a threshold that produces a stable number of groups by using
distances data to obtain differences between distances that occur,
and by then using the largest of the differences to obtain a threshold.

In Fig... types of grouping. The act in box 356 uses the values from box
352 to obtain a threshold that would produce thresholded distance
data defining a number of groups. The number of groups is stable
across a larger range of thresholds than another number of groups...

...Fig. 11.

The act in box 356 uses the data image from box 350 to obtain a
distances data image in which each pixel is labeled with a distance to
a near neighbor; in the data...
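
The Fig. 6 approach quoted above (take the differences between the distances that occur, then use the largest difference to obtain a threshold) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented method: the function names, the choice of the midpoint of the widest gap as the threshold, and the 1-D example data are all hypothetical. The motivation for the widest gap is that any threshold inside it yields the same grouping, so the number of groups stays stable over the largest range of thresholds.

```python
def largest_gap_threshold(distances):
    """Return a threshold placed in the widest gap between the distinct distances."""
    values = sorted(set(distances))
    if len(values) < 2:
        raise ValueError("need at least two distinct distances")
    # Pick the adjacent pair of distinct distances with the largest difference.
    _, low, high = max((b - a, a, b) for a, b in zip(values, values[1:]))
    # Assumption: use the midpoint of that gap as the threshold.
    return (low + high) / 2


def group_by_threshold(positions, threshold):
    """Group 1-D positions; a new group starts when the gap exceeds the threshold."""
    ordered = sorted(positions)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= threshold:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return groups


positions = [0, 1, 2, 10, 11, 12, 30]
# Distances between neighbouring items stand in for the "distances data".
gaps = [b - a for a, b in zip(positions, positions[1:])]
threshold = largest_gap_threshold(gaps)
print(threshold, group_by_threshold(positions, threshold))
```

With this toy input the distinct distances are 1, 8 and 18, the widest gap lies between 8 and 18, and items whose neighbour distance falls below the threshold end up grouped together.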


...CLAIMS meet a foreground neighbor border criterion, a distance to a near
neighbor in the initial array; the distance data indicating a
distance to a near neighbor for each value item in the initial array;
the act of using the threshold data to obtain grouping data further
comprising:

grouping value items together that have distances that are
below the threshold, or grouping value items together that have
distances that are above the threshold, or in which the gap data
indicate, for value items in the initial array that meet an...

...neighbor border criterion, a distance to a near neighbor in the
complement of the initial array; the distance data indicating a
distance to a near neighbor in the complement of the initial array
for each value item in the initial array; the act of using the
threshold data to obtain grouping data further comprising:

grouping value items together that have distances that are
above the threshold.
12. A method of operating a machine that includes:

a processor (66) connected for accessing a memory...

...CLAIMS to 5 in which the gap data indicate, for each value item in the
initial array, a distance; the threshold data indicating a
threshold gap value; the act of using the threshold data to obtain
grouping data comprising:
comparing (274) the threshold gap value with each (262, 270, 272)
distance indicated by the gap data.
7. The method of one of claims 1 to 6 in which the reference criterion
requires a...meet a foreground neighbor border criterion, a distance
to a near neighbor in the initial array; the distance data
indicating a distance to a near neighbor for each value item in the
initial array; the act of using the threshold data to obtain grouping
data further comprising:
grouping value items together that have distances that are below
the threshold or
grouping value items together that have distances that are above the
threshold or in which the gap data indicate, for value items in the
initial array that meet an initial array; the distance data
indicating a distance to a near neighbor in the complement of the
initial array for each value item in the initial array; the act of
using the threshold data to obtain grouping data further
comprising:

grouping value items together that have distances that are above
the threshold.

22. A machine comprising:
memory (68; 192, 194; 520, 540) for storing data; and
a processor (66...



12/3,K/16 (Item 16 from file: 348)

DIALOG(R) File 348:EUROPEAN PATENTS
(c) 2005 European Patent Office. All rts. reserv.

00488004

Apparatus and method for determining and displaying the difference between
two technical drawings
Gerat und Verfahren zur Ermittlung und Darstellung von Unterschieden
zwischen zwei technischen Zeichnungen
Dispositif et procede pour la determination et la presentation des
differences entre deux schemas techniques
PATENT ASSIGNEE:
KABUSHIKI KAISHA TOSHIBA, (213130), 72, Horikawa-cho, Saiwai-ku,
Kawasaki-shi, Kanagawa-ken 210, (JP), (applicant designated states:
DE;FR;GB)
INVENTOR:
Doi, Miwako, c/o Intellectual Property Div., Toshiba Corporation, 1-1-1,
Shibaura, Minato-ku, Tokyo, (JP)
Fukui, Mika, c/o Intellectual Property Div., Toshiba Corporation, 1-1-1,
Shibaura, Minato-ku, Tokyo, (JP)
Okazaki, Akio, c/o Intellectual Property Div., Toshiba Corporation,
1-1-1, Shibaura, Minato-ku, Tokyo, (JP)
Numagami, Hideo, c/o Intellectual Property Div., Toshiba Corporation,
1-1-1, Shibaura, Minato-ku, Tokyo, (JP)
Okamoto, Yasukazu, c/o Intellectual Property Div., Toshiba Corporation,
1-1-1, Shibaura, Minato-ku, Tokyo, (JP)
Tsuboi, Hiroyuki, c/o Intellectual Property Div., Toshiba Corporation,
1-1-1, Shibaura, Minato-ku, Tokyo, (JP)
Hirakawa, Hideki, c/o Intellectual Property Div., Toshiba Corporation,
1-1-1, Shibaura, Minato-ku, Tokyo, (JP)
Kurosawa, Yuuichi, c/o Intellectual Property Div., Toshiba Corporation,
1-1-1, Shibaura, Minato-ku, Tokyo, (JP)
LEGAL REPRESENTATIVE: 

BATCHELLOR, KIRK & CO. (100991) , 2 Pear Tree Court Farringdon Road, 
London EC1R 0DS, (GB) 
PATENT (CC, No, Kind, Date): EP 478315 A2 920401 (Basic)
EP 478315 A3 930609
EP 478315 B1 960814
APPLICATION (CC, No, Date): EP 91308758 910925;
PRIORITY (CC, No, Date): JP 90255074 900927
DESIGNATED STATES: DE; FR; GB
INTERNATIONAL PATENT CLASS: G06F-017/30;
ABSTRACT WORD COUNT: 186

LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:

Available Text  Language   Update    Word Count
CLAIMS B        (English)  EPAB96      1501
CLAIMS B        (German)   EPAB96      1259
CLAIMS B        (French)   EPAB96      1833
SPEC B          (English)  EPAB96      4568
Total word count - document A             0
Total word count - document B          9161
Total word count - documents A + B     9161

...SPECIFICATION correspondence analysis section of the apparatus of
Figure 3;

Figure 10 shows a matrix for determining a maximum similarity
value in the logical information memory section of the apparatus of
Figure 3;

Figure 11A and 11B show an example of a node...



12/3,K/25 (Item 2 from file: 349)

DIALOG(R) File 349:PCT FULLTEXT
(c) 2005 WIPO/Univentio. All rts. reserv.

01083293 **Image available**

METHOD AND APPARATUS FOR CLASSIFICATION OF A DATA OBJECT IN A DATABASE
PROCEDE ET APPAREIL DE CLASSIFICATION D'UN OBJET DE DONNEES DANS UNE BASE
DE DONNEES

Patent Applicant/Assignee:
KONINKLIJKE PHILIPS ELECTRONICS N V, Groenewoudseweg 1, NL-5621 BA
Eindhoven, NL, NL (Residence), NL (Nationality), (For all designated
states except: US)
Patent Applicant/Inventor:
BODLAENDER Maarten P, c/o Prof. Holstlaan 6, NL-5656 AA Eindhoven, NL, NL
(Residence), NL (Nationality), (Designated only for: US)
Legal Representative:
GROENENDAAL Antonius W M (agent), Philips Intellectual Property &
Standards, Prof. Holstlaan 6, NL-5656 AA Eindhoven, NL,
Patent and Priority Information (Country, Number, Date):

Patent: WO 200406128 A2-A3 20040115 (WO 0406128) 

Application: WO 2003IB2911 20030627 (PCT/WO IB03002911) 

Priority Application: EP 200277765 20020709 
Designated States : 

(Protection type is "patent" unless otherwise stated - for applications 
prior to 2004) 

AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ 
EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR 
LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD 
SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW 

(EP) AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE 
SI SK TR 

(OA) BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG 

(AP) GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW 

(EA) AM AZ BY KG KZ MD RU TJ TM 
Publication Language: English 
Filing Language: English 
Fulltext Word Count: 4591 



Fulltext Availability: 
Detailed Description 

Detailed Description 

...have multiple classification parameters associated with them. In that
case, a data object is sorted in multiple groups.

When the data objects have been grouped per equal value of at least
one classification parameter, similarity of data objects with equal
values of the classification parameter is identified in a process step
404. The process step 404 comprises two...



12/3,K/26 (Item 3 from file: 349)

DIALOG(R) File 349:PCT FULLTEXT
(c) 2005 WIPO/Univentio. All rts. reserv.

01022586 **Image available**
INFORMATION RESOURCE TAXONOMY
TAXINOMIE DE RESSOURCES D'INFORMATIONS

Patent Applicant/Assignee:
TELSTRA NEW WAVE PTY LTD, ACN 070 562 935, 242 Exhibition Street,
MELBOURNE, Victoria 3000, AU, AU (Residence), AU (Nationality), (For
all designated states except: US)
Patent Applicant/Inventor:
RYAN Simon David, 11 Sandgate Avenue, GLEN WAVERLEY, Victoria 3150, AU,
AU (Residence), AU (Nationality), (Designated only for: US)
RASKUTTI Bhavani, 4 Empress Road, SURREY HILLS, Victoria 3127, AU, AU
(Residence), AU (Nationality), (Designated only for: US)
PHIET Do Quang, 25 Andleigh Drive, MULGRAVE, Victoria 3170, AU, AU
(Residence), AU (Nationality), (Designated only for: US)
SEMBER Peter Paul, 22 Neville Street, CARNEGIE, Victoria 3163, AU, AU
(Residence), AU (Nationality), (Designated only for: US)
Legal Representative:
DAVIES COLLISON CAVE (agent), WEBBER, David, Brian, PRYOR, Geoffrey,
Charles, LESLIE, Keith, 1 Little Collins Street, MELBOURNE, Victoria
3000, AU,

Patent and Priority Information (Country, Number, Date):
Patent: WO 200352627 A1 20030626 (WO 0352627)
Application: WO 2002AU1719 20021218 (PCT/WO AU0201719)
Priority Application: AU 20019589 20011218

Designated States: 

(Protection type is "patent" unless otherwise stated - for applications 
prior to 2004) 

AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ 
EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR 
LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG 
SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW 

(EP) AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SI SK 
TR 

(OA) BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG 

(AP) GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW 

(EA) AM AZ BY KG KZ MD RU TJ TM 
Publication Language: English 
Filing Language: English 
Fulltext Word Count: 4012 

Fulltext Availability: 
Detailed Description 
Detailed Description 

...numeric similarity measure is then determined as a function of any two
word vectors to determine the similarity of any two documents. For...

...similarity falls within a threshold similarity value for
clustering. Once formed, a cluster is characterised...measure used is
the cosine similarity function, described in the TACT specification. The
clustering process uses this similarity measure to group similar
documents into clusters by assigning each document to the most
similar cluster. An optimal similarity threshold value for
creating clusters from a given document set is determined by creating
different groupings of the documents at different thresholds and then
evaluating these...this effect. In the first process, the coherence of
the clusters is maintained as the number of documents n increases by
reducing the similarity threshold with increasing n. In the second
process, a new random sample better representing the population is...

...the optimality of the existing clusters and/or as a means for
determining a new quasi-optimal similarity threshold value for
subsequent re-clustering of the document space to improve accuracy.

To reduce the time required by the search for an optimal or quasi...
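
As a rough illustration of the clustering idea in the excerpt above (a cosine similarity between word vectors, each document assigned to its most similar cluster, and a similarity threshold deciding whether it joins or starts a new cluster), here is a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the whitespace tokenisation, the summed word counts used as a cluster centroid, the single-pass order, and the names `cosine` and `cluster` do not come from the cited specification.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold):
    """Single-pass clustering: join the most similar cluster if it meets the threshold."""
    clusters = []  # each entry: {"centroid": Counter, "members": [document indices]}
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # summed counts act as an unnormalised centroid
        else:
            clusters.append({"centroid": vec, "members": [i]})
    return [c["members"] for c in clusters]


docs = [
    "stock market falls sharply",
    "stock market rises sharply",
    "football team wins cup",
    "football team loses cup",
]
print(cluster(docs, threshold=0.5))
```

Running `cluster` at several thresholds and comparing the resulting groupings is one simple way to explore the "different groupings at different thresholds" step mentioned in the excerpt; how those groupings are evaluated is not shown in the excerpt and is not attempted here.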
probability function for each line from said

...applied to this limiting value. Such aggregate functions especially
comprise the mean pairwise distance of the
data set to all other data sets in the cluster which may be the
arithmetic mean distance, the...

...distance defined in another way. A further example of such an aggregate
function is a median distance of a data set to all other data sets,
i.e. the distance separating the lower 50% of the distance values
from the remaining 50%, the latter lying above this value. One may also
think of generalising...
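
The two aggregate functions named in this excerpt, the arithmetic mean and the median of the distances from one data set to all other data sets in the cluster, can be computed as in the following sketch. The function name `aggregate_distances`, the absolute-difference distance and the example values are assumptions for illustration only.

```python
from statistics import mean, median


def aggregate_distances(index, points, distance):
    """Mean and median distance from points[index] to every other point in the cluster."""
    others = [distance(points[index], p) for j, p in enumerate(points) if j != index]
    return mean(others), median(others)


# Hypothetical one-dimensional cluster; any distance function can be passed in.
cluster_points = [0.0, 1.0, 2.0, 10.0]
print(aggregate_distances(3, cluster_points, lambda a, b: abs(a - b)))
```

The median is less sensitive than the mean to a single far-away member, which is one reason to consider it as an alternative aggregate.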
Detailed Description

Detailed Description

...322. Data comparator 320 has a first data input to receive a
plurality of similarity values from similarity matrix 310 and a second
data input to receive threshold criteria data 312. Data comparator 320
is executable to compare each of the plurality of similarity values of
similarity matrix 310 with threshold criteria data 312, preferably
as described below.

Cluster assignor 322 is executable to generate cluster data 314 in
connection with data comparator 320. Cluster data...
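
A minimal Python sketch of a comparator-plus-assignor arrangement like the one described above: each value of a similarity matrix is compared with a threshold, and cluster labels are derived from the comparison results. The excerpt does not show the actual comparison rule or assignment procedure, so the `>=` test, the transitive merging through a union-find structure, and the name `assign_clusters` are all assumptions.

```python
def assign_clusters(similarity, threshold):
    """Label items so that pairs with similarity >= threshold share a cluster label."""
    n = len(similarity)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:  # the comparator step (assumed rule)
                parent[find(j)] = find(i)

    # Number the clusters in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


similarity_matrix = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(assign_clusters(similarity_matrix, threshold=0.7))
```

Merging transitively means two items can share a cluster without being directly similar; a stricter rule (for example, requiring every pair in a cluster to pass the threshold) would give smaller clusters.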
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method, system and. . . 

...the keywords in a multilingual document that are translated and keywords 
identified in OCRed images of electronic documents ; (f) are adapted to 
summarize search results. 

BRIEF DESCRIPTION OF DRAWINGS 

These and other aspects of the invention will become apparent from. . . 
the results are excessive (i.e., over a predefined limit) or are
insufficient (i.e., under a predefined number) that have distance
measurements within the preset threshold value, then act 416 is
performed; otherwise, act 418 is performed.

At 416, if there exists more than... 

...electronic document, and is adapted to return a set of documents 

(including their locations (e.g., URLs), summaries , text content , 
applied services) that includes documents similar (i.e., matches, 
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information stream and the reference coordinate value. 
In one embodiment of the invention, the condition. . . 

. . .at least one audio information stream corresponding to at least one 
related information stream having a coordinate value within a 
prescribed range from a reference coordinate value which is obtained 
based on the plurality of coordinate values , based on a distance 
between the coordinate value included in the at least one related 
information stream and the reference coordinate value. 

In one embodiment of the invention, the step. . .The audio information 
streams stored in the audio information database 40 are each provided 
with an audio information stream number. An index abstract , or the 
like of the contents of each audio information stream may also be stored 
as a . . . 
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SUMMARY OF THE INVENTION 
It is therefore an object of the present invention to provide a system
in... of similarity for users 2 and 3 are obtained as 0 and 6.6,
respectively.

When the similarity exceeds a predetermined delivery threshold
value, the text is delivered to the user associated with the pertinent
retrieval condition. Since the threshold value ... than the delivery
threshold value are additionally delivered in the similarity descending
order. As a result, even when the number of texts whose similarity
exceeds the delivery threshold value set by the user is less than
that of texts desired by the user, a predetermined number of texts
can be additionally delivered to the user. Therefore, when no text is
delivered to the. . .
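
The delivery rule in this excerpt can be sketched as below: texts above the delivery threshold are always delivered, and if that yields fewer texts than the user wants, further texts are added in descending order of similarity. The function name, parameter names, and sample scores are hypothetical, and the treatment of a score exactly equal to the threshold is an assumption.

```python
def select_texts(scores, delivery_threshold, desired_count):
    """Return ids of texts to deliver, highest similarity first.

    scores: mapping of text id -> similarity to the retrieval condition.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Texts whose similarity exceeds the threshold are always delivered.
    selected = [text_id for text_id, s in ranked if s > delivery_threshold]
    shortfall = desired_count - len(selected)
    if shortfall > 0:
        # Top up with the best of the remaining texts, in descending order.
        below = [text_id for text_id, s in ranked if s <= delivery_threshold]
        selected += below[:shortfall]
    return selected


scores = {"a": 0.9, "b": 0.1, "c": 0.4, "d": 0.7}
delivered = select_texts(scores, delivery_threshold=0.5, desired_count=3)
```

With these sample values, texts `a` and `d` pass the threshold and `c` is added to reach the desired count of three.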
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factor or higher together with their similarity factor, document 
numbers , position numbers in document and the like, or simply display 
only predetermined numerals . The order of display may be the sequence 
of appearance in the document (s) or in the... 
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SG SK SL TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW 
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SI SK TR 

(OA) BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG 
(AP) GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW 



(EA) AM AZ BY KG KZ MD RU TJ TM 
Publication Language: English 
Filing Language: English 
Fulltext Word Count: 3805 

Main International Patent Class: G06F-017/30 
Fulltext Availability: 
Detailed Description 

Detailed Description 

... herein as the subject article. 

In step 306, distiller 104 extracts the textual body of the subject 
article . The title, abstract , figures, and other metadata of the 
subject article are discarded. This prevents the metadata from 
influencing the. . . 

...while high proximity scores represent terms generally appearing 
distanced from one another. 

In an alternative embodiment, proximity scores 906 are calculated as 
some predetermined number , e.g., twenty-five, minus the distance 
between terms as a number of terms and is never less than one if 
the terms appear in the same language unit, e.g., in the same sentence 
Thus ... 
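
The alternative proximity score described above reduces to one line of arithmetic. A minimal sketch, assuming both terms lie in the same language unit and that positions are term indices. The function and parameter names are hypothetical, and the default of twenty-five is the example value given in the excerpt.

```python
def proximity_score(pos_a, pos_b, predetermined=25):
    """Proximity of two terms that appear in the same language unit.

    The score is the predetermined number minus the distance between the
    terms, counted in terms, and is never less than one.
    """
    distance = abs(pos_a - pos_b)
    return max(predetermined - distance, 1)


adjacent = proximity_score(3, 4)      # terms next to each other
far_apart = proximity_score(0, 40)    # more than 25 terms apart
```

In this embodiment a higher score means the terms are closer together, which is the reverse of the convention described just before it.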
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Detailed Description 

. . . program product and a device that 

enable a user to arrange a plurality of such documents 
or data sets in a simple way. 

Summary of the Invention 
The above object is achieved wholly or partially by
a method according to claim. . . on
the imaginary surface between pairs of coordinates for
two points recorded directly after each other is less
than a certain predetermined distance value. A physically
continuous line can thus be discontinuous if, when it is
recorded in electronic form, it ... detected when the distance on the
imaginary surface
between two points recorded directly after each other is
larger than said predetermined distance value. The
predetermined distance value can be selected depending
upon which type of area is being recorded. An
electronically recorded line comprising. . .
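
The continuity test in this excerpt can be illustrated with a short sketch that splits a sequence of recorded points into lines wherever two consecutive points are farther apart than the predetermined distance value. The function name and sample coordinates are assumptions. The excerpt does not say how a gap exactly equal to the value is treated, so it is counted as continuous here.

```python
import math


def split_lines(points, max_gap):
    """Split consecutively recorded (x, y) points into continuous lines."""
    lines = []
    for point in points:
        # Continue the current line if the gap to its last point is small
        # enough; otherwise start a new line at this point.
        if lines and math.dist(lines[-1][-1], point) <= max_gap:
            lines[-1].append(point)
        else:
            lines.append([point])
    return lines


recorded = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
lines = split_lines(recorded, max_gap=2)
```

`math.dist` requires Python 3.8 or later.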
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Detailed Description 

algorithm can add the total value of composite features found in the
text segments and compare this value against a similarity threshold.
Similarly, although it is preferred to determine feature values based
on the use of a machine learning algorithm, feature values can be 
predetermined based on human experience through the use of a look-up 
table. Alternatively, all features can be... 

...determining similarity in small text segments described 

herein form an important component in larger systems, such as document 
archiving systems and multi-document summarization systems.

Although the present invention 
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... No. 5,694,593. Assigned to Northeastern University, Boston, MA. 

3 K. Baclawski and D. Simovici. An abstract model for semantically rich 
information retrieval. Technical report, Northeastern University,

Boston, MA, March 1994. 

4 A. Campbell and S. Shapiro. Algorithms for... with the highest 
similarity in each target ontology are returned. In another embodiment 
all objects which generate similarity values greater than a 
predetermined value are considered sufficiently similar to the query 
to be returned to the user as relevant information. 

Once. . . 
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Detailed Description 

... also features and metadata. The data model used for such a database 
can support the representation of information at many levels of 
abstraction, including:

1. The data representation level, which contains the actual data of the 
information object. 

2. The data object level, which ... Computer Science, Northeastern 
University, Boston, MA, 1997. 

4. P. Hayes and J. Carbonell. Scout - automated query-relevant document
summarization. Technical Report 1997 Project Summary, Carnegie Group,
Pittsburgh, PA, 1997.

5. Y. Ohta. Knowledge-Based Interpretation of Outdoor Natural Color
Scenes.
Pitman... 3 service can be requested. Then, based on the measure of
similarity, the implementation can return a predetermined number, N,
of objects with the highest similarity, or, alternatively, all objects
that generate similarity values greater than a predetermined
value, which are considered sufficiently similar to the query to be
returned to the user as relevant information...
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ABSTRACT 

PURPOSE: To improve recognition accuracy for an object by narrowing down or 
supplementing the recognition information of the object by unifying various 
kinds of information. 

CONSTITUTION: The first similarity of an object 1 to be detected for each 
object in an object group is computed from first detection data D by a 
CCD camera 2 and natural quantity stored in memory 4 in advance and 
detected by the CCD camera 2 in an object 1 group. One or more candidates 
similar to the object 1 to be detected are extracted from the object group 
based on the first similarity. Second similarity is computed from second 
detection data D by a scale 3 and the natural quantity stored in the 
memory 4 in advance for each candidate and detected by the scale 3 . An 
information unifying part 7 couples the first similarity with the second
similarity for each candidate by using, for example, Dempster-Shafer
binding rule, and one of the maximum values for coupled similarity is
selected and displayed on a display part 8.
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ABSTRACT 

PURPOSE: To provide the character recognition device which can most 
suitably and stably determine the proper limit of error of candidates with 
respect to the recognition result of a character pattern. 



CONSTITUTION: Category information and distance value information
having smaller values among distance values between a feature
vector of a character pattern and standard vectors in character category
units are outputted from a discriminating part 3. A similar category group
number to which category information among these outputs belongs is
determined by a coefficient selecting part 16, and the discrimination
coefficient value of each similar category group for distinction between
the inclination of correct read and that of erroneous read, distance
value information, and difference value data where the distribution
form of the whole of a distance value string is noticed are led to a
candidate probability calculating part 18, and the limit of error of
candidates is derived from inner products between the discrimination
coefficient value and plural difference value data.
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ABSTRACT 

PURPOSE: To output the required number of graphics from the graphic with a 
highest similarity degree in sequence by calculating distance between 
respective kinds of graphic data and exemplified graphic data and
displaying the image of the required number of the graphics from one with 
short distance in sequence. 

CONSTITUTION: The number of the distance unit necessary for the one change 
of the change of intra-respective graphic elements and the change of intra- 
respective connection relations concerning graphic data is previously set 
and a distance operation means 10 compares the respective kinds of 
graphic data stored in a storage data memory 3 with graphic data stored 
in a retrieve condition memory 6 so as to calculate how many distance 
unit has to be changed in order to be mutually coincident graphic data. A 
retrieving means 7 copes the calculated number of the distance unit with 
the respective kinds of graphic data in the storage data memory 3 and 
decides the required number of graphic data inputted by a retrieve 
condition input means 4 from the minimum number of the unit in sequence. 
Furthermore, a retrieve result output means 8 reads the graphic image 
corresponding to the graphic data from the minimum number of the 
distance unit in sequence and displays it. 
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ABSTRACT 

PURPOSE: To include a correct recognized result and to more efficiently 
contract the number of candidates by generating the threshold to contract 
the candidates by the maximum value and the minimum value of the 
distance of a pattern inputted to a contraction function and a standard 
pattern . 

CONSTITUTION: The distance of respective standard vector information
groups of the pattern information of input information extracted by an
extracting means 2 is calculated by a calculating means and based on the
minimum distance, the threshold is determined by a threshold determining
part 8. Thereafter, the distance group calculated by the determined
threshold is contracted and the candidate group based on the standard
vector corresponding with selection outputting means 7 and 9 is outputted.
Thus, when the information to be a recognizing object is recognized and a
candidate is outputted, the correct recognized result can be included in a
small number of output candidates with high probability. By generating the
threshold contracting the candidates by the maximum value and the
minimum value of the distance of the pattern inputted to the
contraction function and the standard pattern, the correct recognized
result is included and the contraction of the number of candidates is made
more effective.
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Abstract (Basic) : JP 2002230012 A 

NOVELTY - A calculator calculates the similarity between each
document in a document group. Another calculator calculates a
similarity threshold value, based on the similarity between each
document. A clustering unit performs the clustering of the
document group, based on the calculation results.
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USE - For customer management in enterprise. 

ADVANTAGE - The representation document of each cluster is 
performed simply and quickly. 

DESCRIPTION OF DRAWING (S) - The figure shows a block diagram of the
document clustering system. (Drawing includes non-English language
text).
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Data analysis method involves selecting minimum value indicating 
similarity between input data as reference value for converting data 
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Abstract (Basic) : JP 2002024206 A 

NOVELTY - The values indicating similarity between each data in
an input data group are calculated. The minimum calculated value is
selected and used as reference for converting the data in the data
group.

USE - For analysis of base sequence data group and amino acid
data group.

ADVANTAGE - The similarity of data in a data group can be
determined effectively.

DESCRIPTION OF DRAWING (S) - The figure shows a flowchart explaining 
data analysis procedure. (Drawing includes non-English language text). 
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Abstract (Basic) : EP 1170674 A2 

NOVELTY - The minimum distance of a pair of data sets and the 
distance between data sets of a pair of clusters are less than a 
determined lowest limiting value, if a determined difference of the
distance of a data set pair and specified distance is greater than 
or equal to 0. The maximum distance of the pair and the distance 
between the data sets are greater than a determined highest limiting 
value, if the difference is less than or equal to 0. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following : 

(a) Electronic data set ordering apparatus and apparatus operation 
method ; 

(b) Database and computer program 

USE - For database management in computer . 

ADVANTAGE - The maximum number of data levels is limited, thus the 
simple data structure is achieved. 

DESCRIPTION OF DRAWING (S) - The figure shows the sample of 
graphical display of documents. 
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Abstract (Basic) : JP 2000076510 A 

NOVELTY - A distinction unit (10) is provided to distinguish a 
target pattern by collating the target pattern with a reference 
pattern. The distinction unit includes a rewritable memory (11) which 
stores threshold value data corresponding to a similarity in 
pattern distinction. A communication controller (12) rewrites and 
controls the threshold value data based on an external indication. 

USE - For distinction of coins. 

ADVANTAGE - Enables suitable modification of threshold value data, 
and enables setting of suitable distinction capability. Reduces 
modification cost and shortens modification period. DESCRIPTION OF 
DRAWING (S) - The figure shows the block diagram of pattern detecting 
device. (10) Distinction unit; (11) Rewritable memory; (12) 
Communication controller. 
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Aural data search apparatus in multimedia communication - outputs input 
search aural data or corresponding attribute information , when computed 

similarity in extracted characteristics of input search and key aural 
data, is more than preset value 
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Abstract (Basic) : JP 11282857 A 

NOVELTY - A calculator (1e) computes the similarity in extracted
characteristics of input search and key aural data. An output unit (1f)
outputs the input search aural data or the corresponding attribute
information, when the computed similarity is more than
predetermined value.

DETAILED DESCRIPTION - The aural data searched
from a memory (3a) of server (3), are input to input unit (1a),
through a network (2). Key aural data are input to another input unit
(1c). An INDEPENDENT CLAIM is also included for software for aural data
searching.

USE - For searching aural data in multimedia communication.

ADVANTAGE - Since the similarity in extracted characteristics of
input search aural data and key aural data is computed, desired
speaker's aural data are acquirable from database. Enables searching of
aural data even without knowing speaker's name.

DESCRIPTION OF
DRAWING (S) - The figure shows theoretical diagram explaining the
principle involved in aural data searching. (1a,1c) Input units; (1e)
Calculator; (1f) Output unit; (2) Network; (3) Server; (3a) Memory.
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Abstract (Basic) : JP 11045254 A 

NOVELTY - A search server (103) converts the search condition input
from a client (100) into vector expression and searches out the
corresponding sub-document from a transposition file (102) based on the
vector expression of input search conditions.

DETAILED DESCRIPTION - A
document (DB101) is divided into groups each of which consists of an
arbitrary number of sentences. The transposition file (102) defines the
divided group as a sub-document, converts the sub-document into
a vector expression, and stores in a predetermined unit. The search
server (103) compares the similarity of stored vector expression of
sub-document, and vector expression of input search conditions. The
sub-documents whose similarity exceeds a predetermined threshold
value are selected, and listed. An INDEPENDENT CLAIM is included for a
recording medium storing program for operating a computer to search
documents.

USE - None given.

ADVANTAGE - Obtains description in the document relevant to the
input search conditions directly, and utilizes the search result
effectively. Documents containing some other topics are also searched
reliably. Eases solution of desired sub-document.

DESCRIPTION OF
DRAWING (S) - The drawing shows the system block diagram of the document
searching apparatus. (100) Client; (102) Transposition file; (103)
Search server.
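
The sub-document search outlined in this abstract can be sketched as follows. The abstract does not specify the vector expression or the similarity measure, so the term-count vectors and cosine similarity used here are assumptions, as are all function names, the sentence-splitting rule, and the sample text.

```python
import math
import re
from collections import Counter


def split_into_subdocuments(text, sentences_per_group=2):
    """Divide a document into groups of a fixed number of sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_group])
        for i in range(0, len(sentences), sentences_per_group)
    ]


def to_vector(text):
    """Assumed vector expression: lower-cased term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Assumed similarity measure: cosine of the two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def search(text, query, threshold, sentences_per_group=2):
    """List sub-documents whose similarity to the query exceeds the threshold."""
    query_vector = to_vector(query)
    hits = []
    for sub in split_into_subdocuments(text, sentences_per_group):
        score = cosine(to_vector(sub), query_vector)
        if score > threshold:
            hits.append((score, sub))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [sub for _, sub in hits]


document = (
    "Cats sleep a lot. Cats purr. "
    "Stock prices fell today. Markets were volatile."
)
results = search(document, "cats purr", threshold=0.3)
```

Because matching is done per sub-document rather than per document, a passage on the queried topic is returned even when the rest of the document is about something else.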
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Abstract: The competitive learning algorithm yields a fixed number of 
categories as it forms groups with the n-dimensional objects of the 
workspace. A way to establish how good these categories are is to consider 
the number of objects that belong to the class and the mean distance from 
them to the class prototype. These values ascertain the category's 
generality and similarity. Depending on a certain particular problem, it 
may occur that the only interesting categories are those that meet some 
preset values for generality and similarity . In this article , a 
clustering algorithm based on competitive learning is proposed, in which 
the clustering process is repeated for those objects that do not satisfy 
the above conditions, while saving the categories that do. Due to this 
iterative process, the formed groups finally meet the required values of 
generality and similarity. (Author abstract) 6 Refs. 
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Abstract: The organization of large text collections is the main goal of 
automated text categorization. In particular, the final aim is to
classify documents into a certain number of pre-defined categories in
an efficient way and with as much accuracy as possible. On-line and
run-time services, such as personalization services and information
filtering services, have increased the importance of effective and 
efficient document categorization techniques. In the last years, a wide 
range of supervised learning algorithms have been applied to this problem. 
Recently, a new approach that exploits a two-dimensional summarization of 
the data for text classification was presented. This method does not go 
through a selection of words phase; instead, it uses the whole dictionary 
to present data in intuitive way on two-dimensional graphs. Although 
successful in terms of classification effectiveness and efficiency, this 
method presents some unsolved key issues: the design of the training 
algorithm seems to be ad hoc for the Reuters-21578 collection; the 
evaluation has been done only on the 10 most frequent classes of the
Reuters-21578 dataset; the evaluation lacks measure of significance in most 
parts; the method adopted lacks a mathematical justification. We focus on 
the first three aspects, leaving the fourth as the future work. (4 Refs) 
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The sudden expansion of the web and the use of the internet has caused 
some research fields to regain (or even increase) its old popularity. Of 
them, text categorization aims at developing a classification system 

for assigning a number of predefined topic codes to the documents based 
on the knowledge accumulated in the training process. We propose a 
framework based on an automatic inductive classifier, called ILA, for text 
categorization, though this attempt is not a novel approach to the 
information retrieval community. Our motivation are two folds. One is that 
there is still much to do for efficient and effective classifiers. The 
second is of ILA's (Inductive Learning Algorithm) well-known ability in 
capturing by canonical rules the distinctive features of text categories. 
Our results with respect to the Reuters 21578 corpus indicate (1) the
reduction of features by information gain measurement down to 20 is
essentially as good as the case where one would have more features; (2)
recall/precision breakeven points of our algorithm without tuning over top 
10 categories are comparable to other text categorization methods, namely 
similarity based matching, naive Bayes, Bayes nets, decision trees, 
linear support vector machines, steepest descent algorithm. 

English Descriptors: Text; Internet; Classification; On line; Information 
retrieval; Inductive learning; Learning algorithm; Information measure;
Similarity; Categorization; World wide web; Knowledge base; Recall;
Steepest descent method; Vector support machine 
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Set Items Description 

S1   793710  DISTANCE? ? OR SIMILARITY
S2    68892  S1(5N) (STORY OR STORIES OR ARTICLE? ? OR DOCUMENT? ? OR
             PRESS () RELEASE? ? OR CONTENT OR INFORMATION OR DATA OR NEWS OR
             TEXT? ? OR CLIP? ? OR PAGE? ? OR WEBPAGE? ? OR BROADCAST? ? OR
             TELECAST? ?)
S3   835642  (DIVID? OR SEPARAT? OR PARTITION??? OR GROUP??? OR CLUSTER???
             OR CATEGORIZ? OR CATEGORIS?) (5N) (STORY OR STORIES OR
             ARTICLE? ? OR DOCUMENT? ? OR PRESS () RELEASE? ? OR CONTENT OR
             INFORMATION OR DATA OR NEWS OR TEXT? ? OR CLIP? ? OR PAGE? ? OR
             WEBPAGE? ? OR B
S4   136296  (BUFFER??? OR MEMORY OR RAM OR STACK OR QUEU????) (5N) (STORY
             OR STORIES OR ARTICLE? ? OR DOCUMENT? ? OR PRESS () RELEASE? ?
             OR CONTENT OR INFORMATION OR DATA OR NEWS OR TEXT? ? OR CLIP?
             ? OR PAGE? ? OR WEBPAGE? ? OR BROADCAST? ? OR TELECAST? ?)
S5    21771  S1(5N) (VALUE? ? OR SCORE? ? OR NUMBER? ? OR NUMERAL? ? OR
             FUNCTION? ?)
S6     1369  S5(5N) (SMALLER OR MINIMAL OR MINIMUM OR LEAST OR LOWEST OR
             LOWER OR BELOW OR ABOVE OR (LESS OR MORE) () (THEN OR THAN) OR
             GREATER OR HIGHER OR LARGER OR BIGGER OR MAXIMUM OR THRESHOLD?
             ?)
S7   235371  (SUMMARY OR SUMMARIES OR SUMMARIZ? OR SUMMARIS? OR ABSTRACT?
             OR SYNTHES? OR SYNOPSI?) (5N) (STORY OR STORIES OR ARTICLE? ?
             OR DOCUMENT? ? OR PRESS () RELEASE? ? OR CONTENT OR INFORMATION
             OR DATA OR NEWS OR TEXT? ? OR CLIP? ? OR PAGE? ? OR WEBPAGE?
             ? OR BROADC
S8    13588  (PREDETERMIN? OR PRESET? OR PREESTABLISH? OR PREDEFIN? OR
             PREARRANGED OR PRESCRIBED OR (PREVIOUSLY OR PRE) () (DETERMIN?
             OR SET???? OR ESTABLISH? OR DEFIN? OR ARRANGED)) (5N) (VALUE? ?
             OR SCORE? ? OR NUMBER? ? OR NUMERAL? ?)
S9        7  S2(50N)S3:S4(50N)S6
S10       7  RD (unique items)
S11     342  (DIVID? OR SEPARAT? OR PARTITION??? OR GROUP??? OR CLUSTER???
             OR CATEGORIZ? OR CATEGORIS?) (10N)S8
S12       0  S5(30N)S11
S13     338  (PREDETERMIN? OR PRESET? OR PREESTABLISH? OR PREDEFIN? OR
             PREARRANGED OR PRESCRIBED OR (PREVIOUSLY OR PRE) () (DETERMIN?
             OR SET???? OR ESTABLISH? OR DEFIN? OR ARRANGED)) (7N) (BUFFER? ?
             OR QUEUE? ?)
S14       0  S5(50N)S13
S15       0  S2(50N)S13
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LANGUAGE: English RECORD TYPE: Fulltext 

WORD COUNT: 599 LINE COUNT: 00057 

base . " 

PRIMUS Telecommunications Group, Incorporated is a global 
telecommunications company focused on providing domestic and international 
long-distance voice, data, private network and value-added services
to more than 100,000 customers worldwide. Founded in 1994, PRIMUS today 
operates from headquarters in Vienna, Va., with over... 

...region. News and information are available on the Internet at 
http://www.primustel.com.

To receive additional information on PRIMUS Telecommunications
Group, Incorporated via fax at no charge, dial 1-800-PRO-INFO and enter
code PRTL.

Investors are . . . 
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GOAL PROGRAMMING FORMULATIONS FOR A COMPARATIVE ANALYSIS OF SCALAR NORMS 
AND ORDINAL VS. RATIO DATA 

Lee, Sang M; Olson, David L 
INFOR v42n3 PP: 163-174 Aug 2004
ISSN: 0315-5986 JRNL CODE: IOR 
WORD COUNT: 4284

. . .TEXT: mining techniques include linear discriminant analysis and various 
forms of multiple criteria programming classification. Objectives used in 
data mining include maximization of minimum distances of data 
records from critical values , as well as separation of data records 
by minimizing the sum of deviations from critical values. These build upon 
the basic goal programming. . . 
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Targets and how to assess performance against them 

Walsh, Paul 
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WORD COUNT: 54 99 

...TEXT: relative to target must be considered as a separate exercise. The 
methods below only apply to performance data that can be categorised as 
business as usual, that is stable performance. Periods where shifts or 
trends are present must be considered as separate calculations. 



Case 1. Less than 30 points - counting and distance methods 



When the number of data points is less than 30, the methods are 
statistically unsophisticated. Two methods are presented, the counting and 
distance methods. 

The counting . . . 
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The changing careers of patients with chronic mental illness: A study of 
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. . .TEXT: input was to the cluster program. For both analyses, two- stage 
density linkage produced the most distinct clusters , although other 
clustering methods grouped the data in very similar ways. 

A five-cluster solution was chosen for the frequency data, based on peaks 



...of the clustering history. Note that in the SAS cluster program, fewer 
diagnostic statistics are available for distance data . Although a 
smaller number of clusters would seem to be better given N = 49, the 
clustering history showed that beyond eight, the largest... 
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An Investigation of Document Structures 

Shaw, William M., Jr.

Information Processing & Management v26n3 PP: 339-348 1990
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ABSTRACT: Clustering is a useful tool in exploratory data analysis. The 
presence of clustering structure in a document collection and the 
influence of this presence on the success of cluster-based retrieval are 
investigated as a function of term-weight and similarity thresholds . 
The term-weight threshold selects a specific level of indexing 
exhaustivity for a document representation, while the similarity 
threshold specifies the level of the associated single-link hierarchy. 
Clear evidence for clustering structure is found. . . 
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ABSTRACT: The empirical significance of document partitions as a 
function of term-weight and similarity thresholds is investigated. 
The term-weight threshold selects a particular level of indexing
exhaustivity and specificity for the document representation. The
similarity threshold selects a specific level of the related single-link 
hierarchy. The results demonstrate that the same... 

...These results represent the first step in an investigation designed to 
determine if the statistical relevance of document partitions can 
explain the empirical significance of the same partitions. ... 
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TEXT: 

Number 2 U.S. long-distance carrier WorldCom posted lower first
quarter income as its declining long-distance telephone business
offset its higher data, Internet and international revenues.
WorldCom, which plans to create a tracking stock for its 
shrinking consumer and. . . 

...worldwide rose to $9.7 billion, from $9.6 billion a year earlier.
Revenues for the WorldCom group, which includes its data business,
rose 12% to $6.1 billion. Of this, data and Internet services brought
in $2.8...



